SurveyMan: Programming and Automatically Debugging Surveys
Surveys can be viewed as programs, complete with logic, control flow, and
bugs. Word choice or the order in which questions are asked can unintentionally
bias responses. Vague, confusing, or intrusive questions can cause respondents
to abandon a survey. Surveys can also have runtime errors: inattentive
respondents can taint results. This effect is especially problematic when
deploying surveys in uncontrolled settings, such as on the web or via
crowdsourcing platforms. Because the results of surveys drive business
decisions and inform scientific conclusions, it is crucial to make sure they
are correct.
We present SurveyMan, a system for designing, deploying, and automatically
debugging surveys. Survey authors write their surveys in a lightweight
domain-specific language aimed at end users. SurveyMan statically analyzes the
survey to provide feedback to survey authors before deployment. It then
compiles the survey into JavaScript and deploys it either to the web or a
crowdsourcing platform. SurveyMan's dynamic analyses automatically find survey
bugs and control for the quality of responses. We evaluate SurveyMan's
algorithms analytically and empirically, demonstrating its effectiveness with
case studies of social science surveys conducted via Amazon's Mechanical Turk.
Comment: Submitted version; accepted to OOPSLA 2014
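To make the survey-as-program idea concrete, the sketch below shows how a survey with branching control flow might be represented and statically checked before deployment. This is an illustrative Python model, not SurveyMan's actual input language (which is a lightweight tabular format); all names here are hypothetical.

```python
# Hypothetical sketch of a survey as a program with control flow.
# NOT SurveyMan's actual DSL; illustrates the kind of static check
# (dangling branch targets) a survey analyzer could perform.
from dataclasses import dataclass, field

@dataclass
class Question:
    qid: str
    text: str
    options: list[str]
    # branch maps a chosen option to the qid of the next question;
    # options without an entry fall through to the next question in order.
    branch: dict[str, str] = field(default_factory=dict)

def check_branches(questions: list[Question]) -> list[str]:
    """A static check: every branch target must be an existing question."""
    known = {q.qid for q in questions}
    errors = []
    for q in questions:
        for option, target in q.branch.items():
            if target not in known:
                errors.append(
                    f"{q.qid}: option {option!r} branches to unknown question {target!r}"
                )
    return errors

survey = [
    Question("q1", "Do you drink coffee?", ["yes", "no"],
             branch={"no": "q3"}),  # skip the follow-up for non-drinkers
    Question("q2", "How many cups per day?", ["1", "2-3", "4+"]),
    Question("q3", "Thanks! Any comments?", ["(free text)"]),
]

print(check_branches(survey) or "no branch errors")
```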
System Design for Digital Experimentation and Explanation Generation
Experimentation increasingly drives everyday decisions in modern life, as it is considered by some to be the gold standard for determining cause and effect within a system. Digital experimentation has expanded the scope and frequency of experiments, which range in complexity from classic A/B tests to contextual bandit experiments that share features with reinforcement learning.
Although there exists a large body of prior work on estimating treatment effects using experiments, that work did not anticipate the new challenges and opportunities introduced by digital experimentation. Novel errors and threats to validity arise at the intersection of software and experimentation, especially when experimentation is in service of understanding human behavior or autonomous black-box agents.
We present several novel tools for automating aspects of the experimentation-analysis pipeline. We propose new methods for evaluating online field experiments and automatically generating the corresponding analyses of treatment effects. We then draw a connection between software testing and experimental design, applying software testing techniques to a kind of autonomous agent, a deep reinforcement learning agent, to demonstrate the need for novel testing paradigms when a software stack uses learned components that may exhibit emergent behavior. We show how our system may be used to evaluate claims made about the behavior of autonomous agents, and find that some claims do not hold up under test. Finally, we show how to produce explanations of the behavior of black-box software-defined agents interacting with white-box environments via automated experimentation. We show how an automated system can be used for exploratory data analysis, with a human in the loop, to investigate a large space of possible counterfactual explanations.
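As an illustration of the kind of analysis such a pipeline automates, the sketch below computes a difference-in-means average treatment effect for a simple A/B test with a normal-approximation confidence interval. This is a generic textbook estimator, not the dissertation's system; the data and names are illustrative.

```python
# Minimal sketch of an automatically generated A/B-test analysis:
# difference-in-means average treatment effect (ATE) with a
# normal-approximation 95% confidence interval. Generic estimator,
# not this dissertation's actual system.
import math

def ate_with_ci(treated: list[float], control: list[float]):
    """Estimate ATE = mean(treated) - mean(control) with a 95% CI."""
    def mean(xs):
        return sum(xs) / len(xs)
    def var(xs):  # unbiased sample variance
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    ate = mean(treated) - mean(control)
    se = math.sqrt(var(treated) / len(treated) + var(control) / len(control))
    return ate, (ate - 1.96 * se, ate + 1.96 * se)

# Illustrative data: binary click-through outcomes for two experiment arms.
treated = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
control = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
ate, (lo, hi) = ate_with_ci(treated, control)
print(f"ATE = {ate:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```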
3 years of liraglutide versus placebo for type 2 diabetes risk reduction and weight management in individuals with prediabetes: a randomised, double-blind trial
Background:
Liraglutide 3·0 mg was shown to reduce bodyweight and improve glucose metabolism after the 56-week period of this trial, one of four trials in the SCALE programme. In the 3-year assessment of the SCALE Obesity and Prediabetes trial we aimed to evaluate the proportion of individuals with prediabetes who were diagnosed with type 2 diabetes.
Methods:
In this randomised, double-blind, placebo-controlled trial, adults with prediabetes and a body-mass index of at least 30 kg/m², or at least 27 kg/m² with comorbidities, were randomised 2:1, using a telephone or web-based system, to once-daily subcutaneous liraglutide 3·0 mg or matched placebo, as an adjunct to a reduced-calorie diet and increased physical activity. Time to diabetes onset by 160 weeks was the primary outcome, evaluated in all randomised treated individuals with at least one post-baseline assessment. The trial was conducted at 191 clinical research sites in 27 countries and is registered with ClinicalTrials.gov, number NCT01272219.
Findings:
The study ran between June 1, 2011, and March 2, 2015. We randomly assigned 2254 patients to receive liraglutide (n=1505) or placebo (n=749). 1128 (50%) participants completed the study up to week 160, after withdrawal of 714 (47%) participants in the liraglutide group and 412 (55%) participants in the placebo group. By week 160, 26 (2%) of 1472 individuals in the liraglutide group versus 46 (6%) of 738 in the placebo group were diagnosed with diabetes while on treatment. The mean time from randomisation to diagnosis was 99 (SD 47) weeks for the 26 individuals in the liraglutide group versus 87 (47) weeks for the 46 individuals in the placebo group. Taking the different diagnosis frequencies between the treatment groups into account, the time to onset of diabetes over 160 weeks among all randomised individuals was 2·7 times longer with liraglutide than with placebo (95% CI 1·9 to 3·9, p<0·0001), corresponding with a hazard ratio of 0·21 (95% CI 0·13 to 0·34). Liraglutide induced greater weight loss than placebo at week 160 (−6·1% [SD 7·3] vs −1·9% [6·3]; estimated treatment difference −4·3%, 95% CI −4·9 to −3·7, p<0·0001). Serious adverse events were reported by 227 (15%) of 1501 randomised treated individuals in the liraglutide group versus 96 (13%) of 747 individuals in the placebo group.
Interpretation:
In this trial, we provide results for 3 years of treatment, with the limitation that withdrawn individuals were not followed up after discontinuation. Liraglutide 3·0 mg might provide health benefits in terms of reduced risk of diabetes in individuals with obesity and prediabetes.
Funding:
Novo Nordisk, Denmark
Creating Conversational Characters Using Question Generation Tools
This article describes a new tool for extracting question-answer pairs from text articles and reports three experiments that investigate how suitable this technique is for supplying knowledge to conversational characters. Experiment 1 demonstrates the feasibility of our method by creating characters for 14 distinct topics and evaluating them using hand-authored questions. Experiment 2 evaluates three of these characters using questions collected from naive participants, showing that the generated characters provide full or partial answers to about half of the questions asked. Experiment 3 adds automatically extracted knowledge to an existing, hand-authored character, demonstrating that augmented characters can answer questions about new topics, but with some degradation in the ability to answer questions about topics the original character was trained to answer. Overall, the results show that question generation is a promising method for creating or augmenting a question-answering conversational character from an existing text.
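To illustrate how a character might answer from such extracted knowledge, the sketch below picks the stored question with the highest word overlap with the user's question and returns its answer. This is a minimal, hypothetical retrieval step, not the article's actual question-generation tool; the QA pairs shown are invented examples.

```python
# Minimal sketch of answering from extracted question-answer pairs:
# return the answer whose stored question best overlaps (by word)
# with the user's question. Illustrative only; not the article's system.

def tokenize(s: str) -> set[str]:
    # Lowercase and strip simple punctuation before splitting into words.
    return set(s.lower().replace("?", " ").replace(".", " ").split())

def answer(user_question: str, qa_pairs: list[tuple[str, str]]) -> str:
    def overlap(stored_q: str) -> int:
        return len(tokenize(user_question) & tokenize(stored_q))
    best_q, best_a = max(qa_pairs, key=lambda qa: overlap(qa[0]))
    return best_a if overlap(best_q) > 0 else "I don't know."

# QA pairs as they might be extracted from a text article (invented).
pairs = [
    ("Where was the telescope built?", "It was built in Pasadena."),
    ("When did construction begin?", "Construction began in 1936."),
]
print(answer("when did they begin construction", pairs))
```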